retrieval task
- Law (1.00)
- Information Technology > Security & Privacy (0.94)
- Government (0.68)
IMPACT: A Large-scale Integrated Multimodal Patent Analysis and Creation Dataset for Design Patents
Our dataset includes half a million design patents comprising 3.61 million figures along with captions from patents granted by the United States Patent and Trademark Office (USPTO) over a 16-year period from 2007 to 2022. We incorporate the metadata of each patent application with elaborate captions that are coherent with multiple viewpoints of designs.
- North America > United States > Illinois > Cook County > Chicago (0.04)
- North America > United States > Illinois > Champaign County > Urbana (0.04)
IDEA: An Invariant Perspective for Efficient Domain Adaptive Image Retrieval
More importantly, we employ a generative model for synthetic samples to simulate the intervention of various non-causal effects, thereby minimizing their impact on hash codes for domain invariance. Comprehensive experiments conducted on benchmark datasets confirm the superior performance of our proposed IDEA compared to a variety of competitive baselines.
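To make the role of hash codes in retrieval concrete, here is a minimal sketch of ranking a database by Hamming distance to a query code. It is not the IDEA method: the sign-of-projection binarization and the names `to_hash_codes` and `hamming_rank` are illustrative assumptions, and the toy codes are made up for the example.

```python
import numpy as np

def to_hash_codes(features, projection):
    """Binarize real-valued features into +1/-1 codes (assumed scheme: sign of a linear projection)."""
    return np.where(features @ projection >= 0, 1, -1).astype(np.int8)

def hamming_rank(query_code, database_codes):
    """Return database indices sorted by Hamming distance to the query, plus the distances."""
    # Hamming distance = number of bit positions where the codes disagree.
    distances = np.count_nonzero(database_codes != query_code, axis=1)
    # A stable sort keeps the original order among ties.
    order = np.argsort(distances, kind="stable")
    return order, distances

# Toy 4-bit codes (hypothetical data, e.g. a query from one domain and a database from another).
database_codes = np.array([
    [ 1, -1,  1,  1],
    [-1, -1,  1, -1],
    [ 1,  1, -1,  1],
], dtype=np.int8)
query_code = np.array([1, -1, 1, -1], dtype=np.int8)

order, distances = hamming_rank(query_code, database_codes)
```

Domain-invariant codes matter here because the ranking depends only on bit agreement: if a domain shift flips bits that carry no semantic content, otherwise matching items move apart in Hamming space.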
- North America > United States > California > Los Angeles County > Los Angeles (0.14)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Europe > Greece (0.04)
- Asia > China > Heilongjiang Province > Daqing (0.04)
- Research Report > Promising Solution (0.67)
- Research Report > New Finding (0.67)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Europe > Switzerland > Zürich > Zürich (0.14)
- North America > Canada > British Columbia > Vancouver (0.04)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Sensing and Signal Processing > Image Processing (0.93)
- Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.68)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
- Information Technology > Information Management > Search (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.76)
- Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.48)
- Information Technology > Artificial Intelligence > Natural Language > Information Retrieval (0.47)
- North America > United States (0.14)
- North America > Dominican Republic (0.04)
Empowering Visible-Infrared Person Re-Identification with Large Foundation Models
Visible-Infrared Person Re-identification (VI-ReID) is a challenging cross-modal retrieval task due to significant modality differences, primarily resulting from the absence of color information in the infrared modality. The development of large foundation models such as Large Language Models (LLMs) and Vision Language Models (VLMs) motivates us to explore a feasible way to empower VI-ReID with off-the-shelf large foundation models. To this end, we propose a novel Text-enhanced VI-ReID framework driven by Large Foundation Models (TVI-LFM). The core idea is to enrich the representation of the infrared modality with textual descriptions automatically generated by VLMs. Specifically, we incorporate a pre-trained VLM to extract textual features from texts generated by a VLM and augmented by an LLM, and incrementally fine-tune the text encoder to minimize the domain gap between the generated texts and the original visual modalities. Meanwhile, to enhance the infrared modality with the extracted textual representations, we leverage the modality alignment capabilities of VLMs and VLM-generated feature-level filters.
Variational Interaction Information Maximization for Cross-domain Disentanglement
Cross-domain disentanglement is the problem of learning representations partitioned into domain-invariant and domain-specific parts, which is key to successful domain transfer and to measuring semantic distance between two domains. Grounded in information theory, we cast the simultaneous learning of domain-invariant and domain-specific representations as a joint objective of multiple information constraints, which does not require adversarial training or gradient reversal layers. We derive a tractable bound on the objective and propose a generative model named Interaction Information Auto-Encoder (IIAE). Our approach offers insights into the desirable representation for cross-domain disentanglement and its connection to the Variational Auto-Encoder (VAE). We demonstrate the validity of our model on image-to-image translation and cross-domain retrieval tasks. We further show that our model achieves state-of-the-art performance on the zero-shot sketch-based image retrieval task, even without external knowledge.
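For orientation, one common textbook definition of the interaction information among three variables is given below. The abstract does not state the exact form or sign convention the authors use, so the symbols here are assumptions: $X$ and $Y$ stand for samples from the two domains and $Z$ for a candidate shared representation.

```latex
% Interaction information (one common sign convention; the opposite sign also appears in the literature)
I(X; Y; Z) = I(X; Y) - I(X; Y \mid Z)
% where the mutual information terms are
I(X; Y) = \mathbb{E}_{p(x, y)}\!\left[ \log \frac{p(x, y)}{p(x)\, p(y)} \right],
\qquad
I(X; Y \mid Z) = \mathbb{E}_{p(x, y, z)}\!\left[ \log \frac{p(x, y \mid z)}{p(x \mid z)\, p(y \mid z)} \right]
```

Under this convention, the quantity is large when conditioning on $Z$ removes most of the dependence between $X$ and $Y$, which is the intuition behind treating $Z$ as a domain-invariant representation.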